METHOD AND APPARATUS FOR IMPROVING OUTPUT SKEW

Field of the Invention



The present invention generally relates to the field of integrated circuit devices, and more 
particularly relates to the generation of signals across an output bus of such a device. 

Background 

As the processing speeds of computer systems have continually increased, there has 
been a corresponding need for faster and faster random access memory (RAM) devices. RAM 
devices, such as dynamic random access memory (DRAM) devices, are typically used as the 
main memory in computer systems. While DRAM devices have gotten faster over the years, the 
operating speeds of DRAM devices still lag behind the operating speeds of the processors which 
access the DRAM devices. Consequently, the relatively slow access and cycle times of DRAM 
devices slow down the processors, and create bottlenecks. 

In response to the need for faster DRAM devices, synchronous dynamic random access 
memory (SDRAM) devices have been developed. SDRAM devices operate synchronously with 
the system clock which drives the processor that accesses the devices, with the input and output 
data of the SDRAM devices being synchronized to an active edge of the system clock. The 
initial SDRAM devices can be referred to as single data rate (SDR) SDRAM devices since their 
peak data rate is equal to the rate at which commands can be clocked into the devices. Single 
data rate SDRAMs are currently in widespread use. 

To provide still faster DRAM devices, double data rate (DDR) SDRAM devices have
been developed to provide twice the memory data bandwidth of SDR SDRAMs. The term DDR 
refers to the fact that the peak data rate is twice the rate at which commands can be clocked into 
the devices. DDR SDRAM devices typically allow commands to be entered on the positive edge 
of the system clock, and allow data transfers on both the rising and falling edges of the system 
clock to provide twice as much data as a SDR SDRAM device. DDR SDRAM devices typically 
employ a 2n-prefetch architecture, in which the internal data bus is twice the width of the
external data bus. With this architecture, each read access cycle internal to the device provides
two external data words, and each write access cycle internal to the device writes two combined
external data words into the device.
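
The 2n-prefetch relationship described above can be illustrated with a short sketch. It is not taken from any device data sheet: the function names are hypothetical, and the choice to send the low-order half of the internal word first is an assumption made only for illustration.

```python
def split_prefetch(internal_word: int, n_bits: int) -> tuple[int, int]:
    """Split one 2n-bit internal word into two n-bit external words.

    Assumption: the low-order half is transferred first. The source text
    does not state which half goes out on which clock edge.
    """
    mask = (1 << n_bits) - 1
    first = internal_word & mask
    second = (internal_word >> n_bits) & mask
    return first, second


def combine_prefetch(first: int, second: int, n_bits: int) -> int:
    """Combine two n-bit external words into one 2n-bit internal word (write path)."""
    mask = (1 << n_bits) - 1
    return ((second & mask) << n_bits) | (first & mask)


# One internal 16-bit read access yields two 8-bit external words.
words = split_prefetch(0xABCD, 8)
print([hex(w) for w in words])
```

For a x8 device, one internal 16-bit access therefore supplies the two 8-bit words transferred on the two edges of a single clock cycle.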

In a purely synchronous system, output data (and capture of the output data by a 
memory controller) would be referenced to a common free-running system clock. In such a 
system, the maximum data rate would be reached when the sum of the output access time and the 
flight time approaches the bit time. Although the data rates could be increased by generating 
delayed clocks for early data launch and/or late data capture, these data rates would still be 
limited because these techniques do not account for the fact that the data valid window (i.e., the 
"data eye") moves relative to any fixed clock signal due to changes in temperature, voltage or 
loading. To allow for even higher data rates, data strobe signals were added to DDR SDRAM 
devices. The data strobe signals are non-free-running signals driven by the device driving the 
data (i.e., the DDR SDRAM devices for READs, and the memory controller for WRITEs). For
READs, the data strobe signal is effectively an additional output having a predetermined pattern.
For WRITEs, the data strobe signal is used by the SDRAM device as a clock in order to capture 
the corresponding input data. 

Referring to FIG. 1, a data output timing diagram 10 for an existing DDR SDRAM 
device illustrates the relationship between the bidirectional data strobe signal and the data 
input/output signals for an exemplary READ operation (e.g., a four-word burst). In this example, 
the DDR SDRAM is assumed to be a 64 Mb x8 DDR SDRAM device available from Micron 
Technology, Inc. The CK and CK# signals represent differential system clock inputs, the DQS 
signal represents the data strobe signal, and the DQ signals represent the data input/output signals 
forming the device data bus. The DQS signal includes preamble, toggling, and postamble 
portions. The preamble portion provides a timing window for the receiving device to enable its 
data capture circuitry with a known/valid level present on the DQS signal. After the preamble 
portion, the DQS signal toggles in the toggling portion at the same frequency as the CK signal 
for the duration of the four-word data burst. Each high transition and each low transition of the
DQS signal is associated with one data word, provided by the DQ signals driven by the DDR 
SDRAM device. In the postamble portion, the DQS signal goes low to indicate the end of the
data burst to the receiving device. Thus, as shown, the data words are transmitted at twice the
frequency of the system clock CK. 

As illustrated in FIG. 1, the DQS signal is nominally edge-aligned with all of the DQ
signals such that all of these output signals will transition at the output pins of the DDR SDRAM
device at nominally the same time. The memory controller will then internally delay the DQS
signal to the center of the received data eye upon capturing the data. The edge-alignment of the
DQS and the DQ signals occurs because these output signals are all clocked out of the DDR
SDRAM device by the same internal clock signal. Ideally, the DQS and DQ signals would all be
perfectly aligned. However, as also shown in FIG. 1, the transitions of the DQS and DQ signals
include a spread or distribution in time, which is due to both static effects (e.g., internal routing
mismatch) and dynamic effects (e.g., data pattern and simultaneously switching outputs (SSO)).
Even if critical signals are properly laid out on the die (e.g., using matching trace lengths),
inherent differences in the package leadfingers' parasitics will contribute to this spread between
the DQS and DQ signals, which is referred to as "output skew". The output skew is specified by
a parameter known as tDQSQ, which is the pin-to-pin skew measured at the DQS and DQ outputs
of the device (i.e., the time between the transition of the DQS signal and the last DQ data valid).
This skew (or |tDQSQ|) region is a region of uncertainty since at least one of the output signals is
not valid. It is noted that the DQS signal may transition first, last, or somewhere in the middle of
the DQ transition window. Maximum tDQSQ is currently specified as 500 psec.

The data word being read will be valid once the latest DQ signal in the group has
transitioned, and will remain valid until the earliest DQ signal transitions as part of the next data
word, or upon completion of the burst. The duration of this data valid window is specified by the
tDV parameter, as shown in FIG. 1. The time between the transition of the DQS signal to the first
DQ signal going non-valid is referred to as tQH (also shown). As is suggested by FIG. 1, output
skew tDQSQ has an adverse impact on data valid window tDV. In particular, a relatively long output
skew region will cause the data valid window to be relatively short. Since the memory controller
can only capture data during the data valid window tDV, the output skew tDQSQ will also adversely
impact the data capture operation.
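
The dependence of the data valid window on output skew can be made concrete with a small calculation. Because tDQSQ and tQH are both measured from the same DQS transition, the definitions above imply tDV = tQH - tDQSQ. The sketch below assumes that reading. The tQH value of 3000 psec is a hypothetical figure chosen only for illustration, and the function name is likewise hypothetical.

```python
def data_valid_window_ps(t_qh_ps: int, t_dqsq_ps: int) -> int:
    """Return tDV = tQH - tDQSQ in picoseconds (zero if the window has closed)."""
    return max(0, t_qh_ps - t_dqsq_ps)


T_QH_PS = 3000  # hypothetical value, not taken from any data sheet

# Compare the 500 psec maximum skew cited above with two reduced-skew cases.
for t_dqsq_ps in (500, 250, 100):
    t_dv_ps = data_valid_window_ps(T_QH_PS, t_dqsq_ps)
    print(f"tDQSQ = {t_dqsq_ps} psec, tDV = {t_dv_ps} psec")
```

Every picosecond removed from tDQSQ is returned to the data valid window, which is why reducing output skew enlarges the data eye.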

Thus, although the addition of data strobe signals allowed for increased data rates, the
operating speeds of existing DDR SDRAM devices are still limited by the output skew specified
by the tDQSQ parameter. In particular, the output skew limits the operating speed (e.g., access and
cycle times) of DDR SDRAM devices. Therefore, it would be desirable to provide a method and
apparatus for reducing skew across the output data bus of a DDR SDRAM device, thereby
enlarging the data eye for data capture by the memory controller. It would also be desirable to
provide a method and an apparatus for reducing skew across multiple output signals in other
memory device types, and other integrated circuit devices.

Summary of the Invention 

According to one aspect of the invention, a synchronous integrated circuit device having
an output bus for outputting a plurality of output signals includes a clock input buffer, a delay
line coupled to the clock input buffer, and an output circuit coupled to the delay line. The clock
input buffer receives a system clock signal and generates a buffered clock signal. The delay line
receives the buffered clock signal and generates a delayed clock signal. The output circuit
includes a plurality of output signal paths for outputting the plurality of output signals
synchronously with the system clock signal by using the delayed clock signal. At least one of
the output signal paths includes a delay circuit and an output buffer coupled to the delay circuit.
Each delay circuit provides a programmable delay to the delayed clock signal to generate a
unique delayed clock signal which is used for clocking an output signal into the respective output
buffer.

According to another aspect of the invention, a method of outputting output signals on 
an output bus of a synchronous integrated circuit device with decreased output skew includes 
receiving a system clock signal, delaying the system clock signal to generate a delayed clock 
signal, and applying the delayed clock signal to a plurality of output signal paths. In each of the 
output signal paths, the method includes using the delayed clock signal to output the output
signals synchronously with the system clock signal. In at least one of the output signal paths, the
method further includes providing a programmable delay to the delayed clock signal to generate 
a unique delayed clock signal used for clocking an output signal out from the respective output 
signal path.

Other aspects of the present invention will be apparent upon reading the following
detailed description of the invention and viewing the drawings that form a part thereof. 

Brief Description of the Drawings 

FIG. 1 is a data output timing diagram for an existing double data rate (DDR)
synchronous dynamic random access memory (SDRAM) integrated circuit device;

FIG. 2 is a circuit block diagram of a DDR SDRAM device having decreased output
skew in accordance with one embodiment of the present invention;

FIG. 3 is a circuit block diagram of a DDR SDRAM device having decreased output
skew in accordance with another embodiment of the present invention;

FIG. 4 is a block diagram showing one embodiment of a variable delay circuit for use in
each of the output paths of the DDR SDRAM device shown in FIG. 2 or FIG. 3;

FIG. 5 is an exemplary data output timing diagram for the DDR SDRAM device of FIG.
2, wherein the variable delay element for one DQ signal is dynamically modified in order to
decrease the output skew between that DQ signal and the DQS signal; and

FIG. 6 is a flowchart showing a method of decreasing output skew in an integrated
circuit device which generates multiple output signals (e.g., a DDR SDRAM device).

Detailed Description

In the following detailed description, reference is made to the accompanying drawings, 
which form a part hereof, and in which is shown by way of illustration specific embodiments in
which the present invention may be practiced. These embodiments are described in sufficient 
detail to enable those skilled in the art to practice the present invention, and it is to be understood 
that the embodiments may be combined, or that other embodiments may be utilized and that 
structural, logical and electrical changes may be made without departing from the spirit and the 
scope of the present invention. The following detailed description is, therefore, not to be taken in
a limiting sense, and the scope of the present invention is defined by the appended claims and 
their equivalents.

Referring to FIG. 2, an exemplary synchronous integrated circuit device 100 in
accordance with one embodiment of the present invention comprises a clock input circuit 102, a
clock delay circuit 104, and an output circuit 106. In this example, device 100 comprises a
double data rate (DDR) synchronous dynamic random access memory (SDRAM) device having
improved output skew in comparison with previous DDR SDRAM devices. This DDR SDRAM
device may be similar to one of the DDR SDRAM devices available from Micron Technology,
Inc., except for the features described herein. For example, the DDR SDRAM device may be
similar to an MT46V8M8, 64 Mb, x8 DDR SDRAM device available from Micron Technology,
Inc., except for the features described herein. This DDR SDRAM device is configured as a 2 M
x 8 x 4 banks DDR SDRAM. Additional background information for the MT46V8M8 DDR
SDRAM device is provided by its data sheet, entitled "Double Data Rate (DDR) SDRAM,
64Mb: x4, x8, x16 DDR SDRAM", Micron Technology, Inc., 2000, and from the article entitled
"DDR SDRAM Functionality and Controller Read Data Capture", Micron DesignLine, Vol. 8,
Issue 3, 3Q99. Both of these documents are incorporated herein by reference in their entirety.

In other embodiments, the apparatus and methods for improving output skew that are
disclosed herein may be used in other types of DDR SDRAM devices having other
configurations. Alternatively, the disclosed apparatus and methods may be used in other
synchronous memory devices, or other synchronous integrated circuit devices, for use in
improving output skew for a plurality of output signals which are output on an output bus. The
apparatus and methods are described herein in reference to a particular DDR SDRAM device for
convenience only, and the invention should not be limited to such a device.

In one embodiment, clock input circuit 102 includes a clock input buffer 108. Clock 
input buffer 108 has an input node for receiving an external or system clock signal (XCLK) 110, 
and an output node for generating an internal or buffered clock signal (CLKIN) 112. Clock input 
buffer 108 provides an inherent delay having a value of A. The detailed implementation of clock
input buffer 108 will depend on the particular application. 

In one embodiment, clock delay circuit 104 includes a delay locked loop (DLL) 114
coupled between clock input buffer 108 and output circuit 106. DLL 114 is configured to
receive buffered clock signal (CLKIN) 112 from clock input buffer 108, and to generate a
delayed clock signal (XDLL) 116. In the embodiment of FIG. 2, DLL 114 includes a delay line
118, a phase detector 120, and an A + B delay model 122. Delay line 118 has an input node for
receiving buffered clock signal (CLKIN) 112, and an output node for generating delayed clock
signal (XDLL) 116. Delay line 118 is configured to delay buffered clock signal (CLKIN) 112 by
an adjustable amount under the control of a DLL control signal 124 to generate delayed clock
signal (XDLL) 116. Phase detector 120 has a first input node for receiving buffered clock signal
(CLKIN) 112, a second input node for receiving an internal DLL clock signal (CLKDLL) 126,
and an output node for generating DLL control signal 124. Phase detector 120 is configured to
detect the phase difference between buffered clock signal (CLKIN) 112 and internal DLL clock
signal (CLKDLL) 126, and to generate DLL control signal 124 based upon the phase difference.
DLL control signal 124 is then applied to delay line 118 to control the amount of delay. Delay
model 122 has an input node coupled (indirectly) to delayed clock signal (XDLL) 116, and an
output node coupled to the CLKDLL input node of phase detector 120. Delay model 122 models
the sum of the delays of input circuit 102 (i.e., A) and output circuit 106 (i.e., B).
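
The feedback loop formed by the delay line, the delay model and the phase detector can be sketched behaviorally as follows. This is a deliberately simplified model and not the disclosed circuit: the delay line is assumed to start at zero delay and to be stepped upward in fixed increments, whereas a practical phase detector would adjust the delay in both directions. The function name, the 10 psec step size and the example values are assumptions.

```python
def lock_delay_line(t_clk_ps: int, model_delay_ps: int, step_ps: int = 10) -> int:
    """Return the delay-line setting (psec) at which CLKDLL lines up with CLKIN.

    CLKDLL is modeled as CLKIN delayed by the delay line plus the A + B
    delay model. Lock is declared when the CLKDLL edge trails a CLKIN edge
    by less than one step of the delay line.
    """
    if t_clk_ps <= 0 or step_ps <= 0 or model_delay_ps < 0:
        raise ValueError("clock period and step must be positive, delays non-negative")
    delay_ps = 0
    # Less than one clock period of added delay is always enough to reach lock.
    while delay_ps < t_clk_ps:
        # Phase detector: lag of the CLKDLL edge behind the latest CLKIN edge.
        phase_error_ps = (delay_ps + model_delay_ps) % t_clk_ps
        if phase_error_ps < step_ps:
            return delay_ps
        # Delay control signal: add one more step of delay.
        delay_ps += step_ps
    raise RuntimeError("delay line failed to lock")


# A 7500 psec clock period with a modeled A + B delay of 5000 psec.
print(lock_delay_line(7500, 5000))
```

In this model the loop settles at the setting for which the delay line plus the modeled A + B delay spans a whole number of clock periods.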

Clock delay circuit 104 is thus configured to provide a delay, having a value of C, 
which is substantially equal to the period of system clock signal (XCLK) 110 less the sum of the 
delays of input circuit 102 and output circuit 106. In other words, clock delay circuit 104 
provides a delay having a value of C = tXCLK - (A + B). By providing delay C, clock delay circuit
104 will cause output signal transitions to appear at the outputs of device 100 in nominal
alignment with the XCLK transitions at the input of device 100. For example, if tXCLK = 7.5 nsec
(i.e., an XCLK frequency of 133 MHz), A = 1.5 nsec and B = 3.5 nsec, then C = 7.5 nsec - (1.5 nsec +
3.5 nsec) = 2.5 nsec. By providing such a delay, clock delay circuit 104 will cause the output
signals of device 100 to transition one (1) clock cycle (i.e., 7.5 nsec) after a transition of XCLK,
such that the output signals will be aligned with the next transition of XCLK. While delays A
and B will vary with voltage and temperature, DLL 114 will vary the value of the delay provided
by clock delay circuit 104 (i.e., delay C) in order to keep the output signals synchronous with
system clock signal (XCLK) 110.
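
The arithmetic of the preceding example, and the way delay C tracks changes in A and B, can be checked with a short sketch. Values are expressed in picoseconds to keep the arithmetic exact. The function name and the two drifted operating points are hypothetical, and the sketch covers only the case described above, in which A + B does not exceed one clock period.

```python
def clock_delay_c_ps(t_xclk_ps: int, a_ps: int, b_ps: int) -> int:
    """Return C = tXCLK - (A + B) in picoseconds."""
    c_ps = t_xclk_ps - (a_ps + b_ps)
    if c_ps < 0:
        # Alignment spanning more than one clock cycle is outside this sketch.
        raise ValueError("A + B exceeds one clock period")
    return c_ps


T_XCLK_PS = 7500  # 133 MHz system clock, as in the example above

# Nominal case from the text, then two hypothetical drifted operating points.
for a_ps, b_ps in [(1500, 3500), (1600, 3700), (1400, 3300)]:
    c_ps = clock_delay_c_ps(T_XCLK_PS, a_ps, b_ps)
    # The total input-to-output delay stays at exactly one clock period.
    print(f"A = {a_ps}, B = {b_ps}, C = {c_ps}, A + C + B = {a_ps + c_ps + b_ps}")
```

In each case the sum A + C + B remains equal to the clock period, which is the condition the DLL maintains as voltage and temperature change.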

In one embodiment, delayed clock signal (XDLL) 116 would be coupled directly to the
input node of A + B delay model 122, and the delay provided by delay line 118 would be equal
to the delay provided by DLL 114. In another embodiment, as shown in FIG. 2, DLL 114 also
includes a clock multiplexer 128 and a DQ multiplexer driver 130, which are collectively
referred to herein as a clock driver circuit 132. Clock driver circuit 132 has an input node
coupled to delay line 118, and a pair of output nodes 134 coupled to output circuit 106 and to A
+ B delay model 122. Clock driver circuit 132 is configured to receive delayed clock signal
(XDLL) 116, to multiplex XDLL 116 into differential delayed clock signals (CLKDQ,
CLKDQL) 136, and to drive the differential delayed clock signals to generate a rising-edge
delayed clock signal (DLLR0) and a falling-edge delayed clock signal (DLLF0) on output nodes
134. Clock driver circuit 132 can thus be used to meet fanout requirements for connecting the
DLLR0/DLLF0 signals to output circuit 106. The total amount of delay provided by clock delay
circuit 104 in this embodiment (i.e., C) is the delay provided by delay line 118 plus the inherent
delay of clock driver circuit 132.

The generation of both rising-edge and falling-edge delayed clock signals (DLLR0 and
DLLF0) may be advantageous for a DDR SDRAM device, where data is clocked into and out of
the device on both the rising and falling edges of the system clock signal 110. In particular, these
DLLR0/DLLF0 signals can advantageously be used to output first and second data words
synchronously with rising and falling edges of system clock signal 110.

In other embodiments, clock delay circuit 104 includes different types of delay locked 
loops. For example, circuit 104 may comprise a digital DLL, an analog DLL, a continually
locked loop, a periodically calibrated delay line, etc. Further, clock delay circuit 104 may or may
not include a clock driver circuit, and may or may not generate both rising-edge and falling-edge
delayed clock signals, depending on the particular application.

Output circuit 106 has one or more input nodes coupled to clock delay circuit 104 for
receiving one or more delayed clock signals. The received delayed clock signal(s) may include
delayed clock signal (XDLL) 116, or both rising-edge and falling-edge delayed clock signals
(DLLR0/DLLF0) 134. For simplicity, the remainder of this description assumes that output
circuit 106 receives the DLLR0/DLLF0 signals, as shown in FIG. 2. As described further below,
output circuit 106 includes a plurality (i.e., n) of output signal paths configured to output the
plurality of output signals synchronously with system clock signal (XCLK) 110 by using delayed
clock signals DLLR0/DLLF0. In the case of device 100 being a DDR SDRAM device, the n
output signal paths include one output signal path for outputting a bidirectional data strobe signal
DQS, and (n-1) output signal paths for outputting (n-1) data input/output signals DQs, in
response to a read command. For a x8 DDR SDRAM, n would equal nine (9), and the nine
output data paths would include one output signal path for the DQS signal, and eight output data
paths for the eight DQ signals.

Atty. Docket No. 303.736US1

Client Ref. No. 00-0915

In one embodiment, each of the n output signal paths of output circuit 106 includes a 
variable or programmable delay circuit 138 and an output buffer 140. Each delay circuit 138 has 
two input nodes for receiving rising-edge and falling-edge delayed clock signals
(DLLR0/DLLF0) 134, and has two output nodes for generating unique rising-edge and falling-edge
delayed clock signals (DLLR0n and DLLF0n) 142. Each delay circuit 138 is configured to
provide a programmable delay to delayed clock signals (DLLR0/DLLF0) 134 to generate unique
delayed clock signals (DLLR0n/DLLF0n) 142 for the nth output signal path. In this
embodiment, the amount of delay provided by each of the delay circuits 138 is independent of
the amount of delay provided by any of the other delay circuits 138. Each of the unique delayed
clock signals (DLLR0n/DLLF0n) 142 is applied to the output buffer 140 of that particular output
signal path, and is used to clock an output signal (i.e., one of the DQ signals or the DQS signal)
into the respective output buffer. Each output buffer 140 then provides the output signal to an
output pad 144, which is typically connected to a pin (i.e., one of the DQ pins or the DQS pin) on
the integrated circuit package for device 100.

The delays provided by delay circuits 138 are programmed to decrease output skew
across the n output signals. Note that, if all of the delay circuits 138 (i.e., the delay circuits for
all of the n output signal paths) were programmed to the same delay value, then the transitions of
the DQ and DQS output signals could still include a relatively large spread in time due to factors
such as internal routing mismatch, data pattern and simultaneously switching outputs, and
inherent differences in the package leadfingers' parasitics. By independently programming each
delay circuit 138, however, the factors contributing to output skew can be compensated for, and
the output skew of device 100 can be reduced.

In one embodiment, output circuit 106 also includes delay control logic 146 for
dynamically programming delay circuits 138 during operation of device 100. Delay control logic
146 is in a feedback path from the DQ and DQS output signals to delay circuits 138. Delay
control logic 146 has input nodes to receive the n output signals 148 from output buffers 140 (or
from other nodes within the output signal paths, such as output pads 144), and has output nodes
for generating delay control signals 150 for delay circuits 138. Delay control logic 146 is
configured to determine output skew between the DQ and DQS signals, and to generate delay
control signals 150 so as to reduce, minimize or eliminate the skew between the DQ and DQS
output signals 148 (or the DQ and DQS signals at pads 144).

In one embodiment, delay control circuit 146 is configured to determine the slowest 
(i.e., worst case) DQ or DQS output signal. The delay circuit 138 corresponding to this slowest 
output signal is set to a zero or minimal delay value. Then, delay control circuit 146 detects the 
output skew between each of the other DQ or DQS signals and the slowest output signal, and
individually programs the delay circuit 138 corresponding to this other DQ or DQS signal based
upon the output skew detected for that DQ or DQS signal. For example, if delay control circuit
146 determines that the DQ3 signal is the slowest output signal, and that the output skew
between the DQ5 and the DQ3 signals is 100 psec (i.e., DQ5 is 100 psec ahead of DQ3), then
delay control circuit 146 generates the delay control signal 150 for the delay circuit 138 for the
DQ5 signal so as to cause a delay of about 100 psec. This 100 psec delay of the DQ5 signal will
cause the DQ3 and DQ5 signals to become aligned. Delay control signals 150 are similarly
generated for all of the other DQ and DQS output signals. Thus, by independently controlling
the DLLR0n/DLLF0n signals 142, the output skew across all of the DQ and DQS output signals
can be decreased, thereby enlarging the data eye for data capture by the memory controller. The 
decreased output skew also allows for increased operating speed (e.g., faster access and cycle 
times) for integrated circuit device 100. 
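
The slowest-signal scheme of this embodiment can be summarized in a brief sketch. It is an illustration only: the arrival times are hypothetical measurements, in picoseconds, of when each output transitions with all delay circuits set to zero delay, and the function name is likewise hypothetical.

```python
def delays_to_match_slowest(arrival_ps: dict[str, int]) -> dict[str, int]:
    """Return the delay (psec) to program on each output path.

    The slowest path receives zero delay, and every other path is delayed
    by its measured skew relative to the slowest path.
    """
    slowest_ps = max(arrival_ps.values())
    return {name: slowest_ps - t_ps for name, t_ps in arrival_ps.items()}


# Hypothetical measurements: DQ3 is slowest and DQ5 is 100 psec ahead of it.
measured = {"DQS": 250, "DQ3": 300, "DQ5": 200}
programmed = delays_to_match_slowest(measured)
print(programmed)

# After programming, every output transitions at the same time.
aligned = {name: measured[name] + programmed[name] for name in measured}
print(aligned)
```

Because delay can only be added in this scheme, all outputs are aligned to the latest transition.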

In another embodiment, delay control circuit 146 defines a reference output signal path, 
such as that for the DQS output signal (although any of the output signal paths may be defined as 
the reference path). The delay circuit 138 for this reference output signal path is set to a 
midpoint delay value. The midpoint delay value may be set in the middle of the delay values that
the delay circuit 138 is capable of providing (i.e., the 50% delay value), or may be set at some
other point between the minimum and maximum delay values (e.g., a 25% or 75% delay). Then,
delay control circuit 146 detects the output skew between each of the non-reference DQ or DQS
signals and the reference output signal, and individually programs the delay circuit 138 for this
non-reference DQ or DQS signal based upon the output skew detected for that DQ or DQS
signal. The delay circuit 138 for any non-reference DQ or DQS signal slower than the reference
signal is set to a delay value less than the midpoint delay (i.e., to speed up that non-reference
signal), and the delay circuit 138 for any non-reference DQ or DQS signal faster than the
reference signal is set to a delay value more than the midpoint delay (i.e., to slow down that
non-reference signal). If, for example, delay control circuit 146 defines the DQS signal as the
reference, and determines that the DQ3 signal is 100 psec slower than the DQS signal, the delay
circuit 138 for the DQ3 signal is set to a delay value 100 psec less than the midpoint delay. If, on
the other hand, delay control circuit 146 finds that the DQ3 signal is 50 psec faster than the DQS
signal, then the delay circuit 138 for the DQ3 signal is set to a value 50 psec more than the
midpoint delay. In either case, the DQ3 signal will become aligned with the DQS signal. Delay
control signals 150 are similarly generated for all of the other non-reference output signals.
Thus, any DQ or DQS signal can be sped up or slowed down to match any other DQ or DQS
signal. Therefore, by independently controlling the DLLR0n/DLLF0n signals, the output skew
across all of the DQ and DQS output signals can be decreased.
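
The reference-path scheme can be sketched in the same way. The sketch is illustrative only: the 200 psec midpoint, the 400 psec maximum delay and the arrival times are hypothetical values, the function name is an assumption, and the clamping of out-of-range settings is an added assumption about how a limited delay range might be handled.

```python
def delays_about_reference(arrival_ps: dict[str, int], reference: str,
                           midpoint_ps: int, max_delay_ps: int) -> dict[str, int]:
    """Return the delay (psec) to program on each output path.

    The reference path is set to the midpoint delay. A slower path gets
    less than the midpoint and a faster path gets more, limited to the
    range the delay circuit can provide.
    """
    ref_ps = arrival_ps[reference]
    delays = {}
    for name, t_ps in arrival_ps.items():
        wanted_ps = midpoint_ps + (ref_ps - t_ps)
        delays[name] = min(max(wanted_ps, 0), max_delay_ps)
    return delays


# DQ3 is 100 psec slower than DQS: its delay is 100 psec below the midpoint.
slow_case = delays_about_reference({"DQS": 1000, "DQ3": 1100}, "DQS", 200, 400)
print(slow_case)

# DQ3 is 50 psec faster than DQS: its delay is 50 psec above the midpoint.
fast_case = delays_about_reference({"DQS": 1000, "DQ3": 950}, "DQS", 200, 400)
print(fast_case)
```

Placing the reference at a midpoint is what allows a path to be either sped up or slowed down. The choice of midpoint sets how much correction is available in each direction.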

It should be understood that the embodiments of delay control circuit 146 described
herein are exemplary, and that other embodiments of delay control circuit 146 may be used.

In order for delay control circuit 146 to detect output skew across the DQ and DQS 
output signals, the DQ and DQS signals should be simultaneously transitioning (e.g., both 
transitioning high, or both transitioning low). While the DQS signal of DDR SDRAM devices is 
defined so as to toggle during its toggling portion at the same frequency as the system clock
signal for the duration of a read data burst, the DQ signals may or may not toggle, depending 
upon the particular data values that are being read. If, for example, the DQn signal were to 
remain at a logic 0 throughout a read data burst, or were to remain at a logic 1 throughout the 
read data burst, then delay control circuit 146 would be unable to compare transitions of the DQS
signal to transitions of the DQn signal, and would therefore be unable to detect the output skew
between the DQn and DQS signals. In this case, delay control circuit 146 would be unable to 
program delay circuits 138 during this data burst. 

In one embodiment, device 100 uses an initialization mode of operation to ensure that
delay control circuit 146 has an opportunity to dynamically program delay circuits 138. As
indicated by manufacturer data sheets, some DDR SDRAM devices include an initialization
mode during which the DQ and DQS output signals are not valid, and should be ignored. For
example, with the MT46V8M8 DDR SDRAM device, users are required to wait for at least 200
system clock cycles after issuing a reset command (i.e., a DLL_RST command) before issuing
another command to the device. During this 200 clock cycle initialization period, this
embodiment of device 100 is configured to toggle the DQ and DQS signals. Although these
signals should be ignored by users, output logic 106 samples the output skew during this period,
and dynamically generates delay control signals 150 to properly program delay circuits 138 to
minimize output skew. The programmed delay circuits 138 can then be used to minimize output
skew of the DQ and DQS signals after the initialization period ends. The programming of delay
circuits 138 may be maintained until a subsequent re-initialization of the DDR SDRAM device,
or until another point in time when delay control circuit 146 is able to determine the output skew
for use as feedback.

In another embodiment, delay control circuit 146 performs dynamic sampling to 
determine output skew between DQ and DQS signals on any given simultaneous transitions of
these signals. For example, delay control circuit 146 may determine output skew between any
DQ signal and the DQS signal whenever that DQ signal and the DQS signal both have a rising
edge, both have a falling edge, or both have either a rising edge or a falling edge. If, for
example, delay control circuit 146 only determines output skew on simultaneous rising edges (or
simultaneous falling edges), the same output skew could be used to program the delay circuit 138
for both the DLLR0n and DLLF0n signals. To ensure that delay control circuit 146 has an
opportunity to program delay circuits 138 in this embodiment, user software may be required to
read appropriate data patterns from device 100 at appropriate times (e.g., periodically during
operation). This embodiment may be combined with the previously-described embodiment, such
that delay circuit programming will occur during initialization, and will then be periodically
updated during operation.
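
The condition under which dynamic sampling is possible can be illustrated with a short sketch that finds the bit times at which a DQ signal and the DQS signal switch in the same direction. The representation of each signal as a list of logic levels (0 or 1), one per bit time, is an assumption made for illustration, as is the function name. The first entry is taken to be the level before the burst begins.

```python
def comparable_edges(dqs_levels: list[int], dq_levels: list[int]) -> list[tuple[int, str]]:
    """Return (bit time, direction) wherever DQS and DQ switch the same way."""
    if len(dqs_levels) != len(dq_levels):
        raise ValueError("both signals must cover the same bit times")
    edges = []
    for i in range(1, len(dqs_levels)):
        dqs_step = dqs_levels[i] - dqs_levels[i - 1]
        dq_step = dq_levels[i] - dq_levels[i - 1]
        # Skew is measurable only when both signals move in the same direction.
        if dqs_step != 0 and dqs_step == dq_step:
            edges.append((i, "rising" if dqs_step > 0 else "falling"))
    return edges


dqs = [0, 1, 0, 1, 0]  # DQS toggling through a four-word burst

print(comparable_edges(dqs, [0, 1, 0, 0, 0]))  # DQ toggles for two words
print(comparable_edges(dqs, [0, 0, 0, 0, 0]))  # DQ held at logic 0
```

A DQ signal held at a constant level yields no usable edges, which is why suitable data patterns must be read in order for the delay circuits to be updated.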

Referring to FIG. 3, another exemplary synchronous integrated circuit device 200 in 
accordance with another embodiment of the present invention comprises a clock input circuit 
202, a clock delay circuit 204, and an output circuit 206. While device 200 is again a DDR
SDRAM device, the apparatus and methods for improving output skew in device 200 may be 
used in other types of memory devices, or integrated circuit devices. Clock input circuit 202 and 
clock delay circuit 204 have the same structure and operation as clock input circuit 102 and clock 
delay circuit 104, described above. However, while output circuit 106 of device 100 
dynamically programs the delays provided by delay circuits 138 in the output signal paths during 
operation of device 100, output circuit 206 of device 200 is configured to set the amount of delay 
in each output signal path in a static fashion. 

In one embodiment, output circuit 206 includes a plurality of output signal paths for 
outputting a plurality of output signals synchronously with system clock signal (XCLK) 110 by 
using delayed clock signal (XDLL) 116 or, as shown in FIG. 3, by using rising-edge and falling-edge 
delayed clock signals (DLLR0/DLLF0) 134. Again, for convenience, it is assumed output 
circuit 206 uses the DLLR0/DLLF0 signals, as shown. In the case of a DDR SDRAM, output 
circuit 206 includes n output signal paths, with the DQS signal being output by one output signal 
path and (n-1) DQ signals output on (n-1) output signal paths. 

Each output signal path includes a variable delay circuit 208 and an output buffer 210. 
Each delay circuit 208 is coupled to clock delay circuit 204 to receive delayed clock signals 
(DLLR0/DLLF0) 134 therefrom. Each delay circuit 208 provides a variable delay to delayed 
clock signals DLLR0/DLLF0 to generate unique delayed clock signals (DLLR0n/DLLF0n) 212, 
for use in clocking output signals into the nth output buffer 210. For example, in a x8 DDR 
SDRAM device, there are nine (9) output signal paths, with eight (8) used to output the eight (8) 
DQ signals and one used to output the DQS signal. The signals output by output buffers 210 are 
then applied to a respective output pad 214. 

The programming of variable delay circuits 208 is performed statically such that, once 
the programming has been performed, delay circuits 208 provide static delays. In one 
embodiment, the programming of delay circuits 208 takes place during the manufacturing 
process, during which output skew of device 200 is measured, and used to permanently configure 
delay circuits 208 to add or subtract delay so as to decrease output skew across the output 
signals. To reduce or eliminate output skew, delay circuits 208 may be configured to slow down 
each output signal path to match the speed of the slowest (i.e., worst case) output signal path, or 
to slow down or speed up each output signal path to match the speed of a reference output signal 
path (e.g., the DQS output signal path), in a manner similar to that described above for the 
operation of delay control circuit 146. In another embodiment, the intrinsic delay of each output 
signal path is estimated, modeled or measured during the design process for the integrated circuit 
device, and delay circuits 208 are each designed to provide an appropriate amount of delay to 
reduce or eliminate output skew. For example, once the signal routing paths on device 200 have 
been designed and are known, the intrinsic delay provided by each signal routing path can be 
determined, and then used to configure delay circuits 208 to provide appropriate amounts of 
delay. Note that, while static programming of delay circuits 208 can be employed to effectively 
reduce or eliminate output skew due to static factors, such as internal routing mismatch, such 
programming is less likely to be effective to reduce or eliminate output skew due to dynamic 
factors, such as skew due to data patterns and simultaneously switching outputs. 
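
The two static matching strategies described above (matching the slowest path, or matching a reference path) reduce to simple arithmetic. The Python sketch below is illustrative only: the function names, the path delays in picoseconds, and the `midpoint_ps` setting that lets a path be sped up as well as slowed down are all assumptions, not values from device 200.

```python
def match_slowest(intrinsic_ps):
    """Added delay per path so that every path equals the slowest one."""
    slowest = max(intrinsic_ps.values())
    return {name: slowest - d for name, d in intrinsic_ps.items()}

def match_reference(intrinsic_ps, ref="DQS", midpoint_ps=100):
    """Added delay per path so that every path equals the reference path.

    The reference path is assumed to sit at a midpoint setting, so faster
    paths get more than the midpoint and slower paths get less.
    """
    ref_delay = intrinsic_ps[ref]
    return {name: midpoint_ps + (ref_delay - d)
            for name, d in intrinsic_ps.items()}

# Hypothetical intrinsic (routing) delays measured or modeled per path.
intrinsic = {"DQS": 500, "DQ0": 470, "DQ1": 520}
to_slowest = match_slowest(intrinsic)
to_reference = match_reference(intrinsic)
```

In both cases the sum of intrinsic delay and added delay is the same for every path, which is the condition for zero static skew in this simplified model.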

Note that, once the DQ and DQS signals have been de-skewed, the trimmable option 
fuses that are present in some current DDR SDRAM devices can be configured (e.g., blown) to 
shift the data window to optimize the access time (i.e., t_AC) of the DDR SDRAM device. The 
access time (t_AC) of DDR SDRAM devices is defined as the access window of the DQS from the 
clock signal (i.e., the difference in time between a clock edge and the related signal transition that 
occurs farthest away from that clock edge in time). By de-skewing the DQ and DQS signals of 
device 100, and by trimming device 100, the access time of device 100 can be lowered below the 
access times of the current devices. 
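
As a worked illustration of why de-skewing followed by trimming lowers the access time, the Python sketch below applies the definition given above (the largest distance between a clock edge and a related output transition). It assumes an idealized trim that can shift all outputs by any common amount; real fuse trimming is quantized, and the offsets used are hypothetical.

```python
def access_time(offsets_ps):
    """Largest distance between the clock edge (time 0) and any transition."""
    return max(abs(o) for o in offsets_ps)

def best_trim(offsets_ps):
    """Common shift that centers the output window on the clock edge."""
    return -(max(offsets_ps) + min(offsets_ps)) / 2

def trimmed(offsets_ps):
    """Offsets after applying the idealized centering shift."""
    shift = best_trim(offsets_ps)
    return [o + shift for o in offsets_ps]

skewed = [200, 350, 500]     # hypothetical output offsets before de-skew
deskewed = [500, 500, 500]   # all paths matched to the slowest path

t_ac_skewed = access_time(trimmed(skewed))
t_ac_deskewed = access_time(trimmed(deskewed))
```

With skew present, trimming can only center the window, so half of the spread remains as access time; once the spread has been removed, the same trim can bring the remaining offset toward zero.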

In the embodiments of FIGs. 2 and 3, the output signal paths for all of the DQ and DQS 
signals include a programmable delay circuit. Alternatively, in other embodiments, fewer than 
all of the output signal paths include such a programmable delay circuit. For example, in one 
embodiment, a first output signal path (e.g., for the DQS signal) provides a non-programmable 
amount of delay, and the other output signal paths (e.g., for the DQ signals) include 
programmable delay circuits which are programmed to reduce output skew with respect to the 
first path. The non-programmable amount of delay of the first path may be an intrinsic delay due 
only to internal routing of that first path, or may be due to both internal routing of that first path 
and a fixed delay circuit coupled within that first path. 

Referring to FIG. 4, a simplified block diagram shows one embodiment of a variable 
delay circuit 300 for use in the output signal paths of device 100 or device 200. Delay circuit 300 
may be used to provide a unique delay to delayed clock signal (XDLL) 116, or to rising-edge and 
falling-edge delayed clock signals (DLLR0/DLLF0) 134, depending upon the particular 
application. Delay circuit 300 includes an input node 302, a plurality of delay stages 304 
arranged in series, and an output node 306. Each delay stage 304 includes a delay element, such 
as a pair of inverters 308, and a switching arrangement to selectively switch the delay element 
into and out of the operative circuit. If, for example, each of switches SW1, SW2, SW3, SW4, 
. . . , and SWm is in a first position as shown in FIG. 4, delay circuit 300 will provide a maximum 
amount of delay equal to the sum of the delays provided by each delay element. If, on the other 
hand, the SWm switch is then moved into its second position, then input node 302 will be 
connected directly to output node 306, and delay circuit 300 will provide a minimum amount of 
delay. By selectively controlling switches SW1 through SWm to switch different delay stages 
304 in and out, delay circuit 300 can provide different amounts of delay under the control of the 
switches. 
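
The behavior of the switched delay chain can be modeled in a few lines of Python. This is a sketch under an explicit assumption: the second position of each switch is taken to tap the input node directly, bypassing that stage and all earlier stages, which is one reading of the description of switch SWm above and may not match the actual topology of FIG. 4. The function name and stage delays are hypothetical.

```python
def chain_delay(stage_delays_ps, second_position):
    """Delay from the input node to the output node of the chain.

    stage_delays_ps[k]: delay of the element in stage k (e.g. two inverters).
    second_position[k]: True if switch k taps the input node directly, so
    the delay elements of stages 0..k are bypassed (assumed topology).
    """
    last_tap = -1
    for k, tapped in enumerate(second_position):
        if tapped:
            last_tap = k  # the latest tap decides where the signal enters
    return sum(stage_delays_ps[last_tap + 1:])

stages = [40, 40, 40, 40]  # hypothetical delay of each stage, in ps
maximum = chain_delay(stages, [False, False, False, False])
minimum = chain_delay(stages, [False, False, False, True])
partial = chain_delay(stages, [False, True, False, False])
```

Under this model, all switches in the first position give the sum of all stage delays, the last switch in its second position gives the minimum delay, and intermediate taps give the values in between.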

In the dynamic programming embodiment of device 100, delay control signals 150 
provided by delay control circuit 146 are used to control the states of switches SW1 through 
SWm. In the static programming embodiment of device 200, the states of these switches may be 
permanently set using metal, fuses, antifuses, or other circuit elements. Note that the number of 
delay stages 304 in delay circuit 300, and the amount of delay provided by each delay stage, will 
depend upon the particular application. Generally, by increasing the number of delay stages, and 
decreasing the delay associated with each delay stage, finer resolution can be achieved and the 
amount of output skew can be decreased. Also note that each delay stage can be configured to 
provide a different amount of delay. 

It should be understood that variable delay circuit 300 shown in FIG. 4 is merely 
illustrative of the many types of variable delay circuits that are known in the art, and many other 
types of variable delay circuits could also be used with the present invention. 

Referring to FIG. 5, an exemplary output data timing diagram for device 100 illustrates 
the dynamic modification of the variable delay circuit 138 for one of the DQ signals in order to 
decrease the output skew between that DQ signal and the DQS signal. In this example, it is 
assumed that the DQS signal has been selected as a reference signal having a midpoint delay, and 
that the timing of the DQ signal will be modified based upon output skew between the DQ signal 
and DQS signal in order to decrease the output skew. 

In response to the first rising edge of the DLLR0 signal, the delay circuits 138 for both 
the DQ and DQS signals are assumed to provide a variable delay of t_VD1, thereby simultaneously 
generating the unique DLLR0n signals for the DQ and DQS signals (i.e., DLLR0_D and DLLR0_S, 
respectively). While the DLLR0_D and DLLR0_S signals occur simultaneously, output skew 
introduced in the output signal paths for these two signals causes the rising edge of the DQ signal 
to lead the rising edge of the DQS signal by an amount t_DQSQ_R, which is the output skew between 
these signals. Similarly, in response to the first rising edge of the DLLF0 signal, the delay 
circuits 138 for both the DQ and DQS signals are assumed to provide a variable delay of t_VD2, 
thereby simultaneously generating the unique DLLF0n signals for the DQ and DQS signals (i.e., 
DLLF0_D and DLLF0_S, respectively). While the DLLF0_D and DLLF0_S signals occur 
simultaneously, output skew introduced in the output signal paths for these two signals causes 
the falling edge of the DQ signal to lead the falling edge of the DQS signal by an amount t_DQSQ_F 
(i.e., output skew). 

The output skew between the DQ and DQS signals on both the rising and falling edges 
is detected by delay control circuit 146, which then adjusts the variable delay provided by the 
delay circuit 138 of the DQ signal to slow down that DQ signal by an appropriate amount to 
reduce the output skew relative to the reference DQS signal. As shown in FIG. 5, in response to 
the second rising edge of the DLLR0 signal, the delay circuit 138 for the DQS signal still 
provides the delay t_VD1 (which was not adjusted since the DQS signal is acting as a reference), 
but the delay circuit 138 for the DQ signal now provides a longer, adjusted delay, thereby generating the 
unique DLLR0n signal for the DQ signal (i.e., DLLR0_D) only after the unique DLLR0n signal 
for the DQS signal (DLLR0_S). The additional delay provided by the delay circuit 138 for the DQ 
signal now compensates for the timing difference in the output signal paths for these two signals, 
and causes the rising edge of the DQ signal to be aligned with the rising edge of the DQS signal, 
thereby reducing or eliminating the output skew between these signals. Similarly, in response to 
the second rising edge of the DLLF0 signal, the delay circuit 138 for the DQS signal still 
provides the delay t_VD2, but the delay circuit 138 for the DQ signal now provides a longer, adjusted delay, 
thereby generating the unique DLLF0n signal for the DQ signal (i.e., DLLF0_D) only after the 
unique DLLF0n signal for the DQS signal (DLLF0_S). The additional delay provided by the 
delay circuit 138 for the DQ signal now compensates for the timing difference in the output 
signal paths for these two signals, and causes the falling edge of the DQ signal to be aligned with 
the falling edge of the DQS signal, thereby reducing or eliminating the output skew between 
these signals. Thus, the DQ and DQS signals have now been aligned, and the output skew 
between these signals has been reduced or eliminated. Note that the timing diagram shown in 
FIG. 5 is merely illustrative, and the actual timing diagram would depend upon the particular 
implementation of the circuits. 
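
The adjustment illustrated in FIG. 5 amounts to adding the measured lead of the DQ signal to that signal's variable delay. The Python sketch below shows that arithmetic under assumptions that are not taken from the figure: the delay is adjustable only in whole steps of `step_ps`, and the numeric values are hypothetical.

```python
def adjust_dq_delay(current_delay_ps, skew_ps, step_ps):
    """New variable delay for a DQ path, given its measured skew.

    skew_ps is positive when the DQ edge leads the reference DQS edge, so
    the DQ path must be slowed down by that amount. The result is rounded
    to a whole number of delay steps and never goes below zero.
    """
    steps = round((current_delay_ps + skew_ps) / step_ps)
    return max(0, steps) * step_ps

def residual_skew(current_delay_ps, new_delay_ps, skew_ps):
    """Skew that remains after the delay change has been applied."""
    return skew_ps - (new_delay_ps - current_delay_ps)

# Hypothetical numbers: DQ leads DQS by 65 ps, delay steps are 20 ps.
old_delay = 200
new_delay = adjust_dq_delay(old_delay, 65, 20)
left_over = residual_skew(old_delay, new_delay, 65)
```

The left-over skew is bounded by half a delay step in this model, which is consistent with the earlier observation that more stages with smaller delays give finer resolution.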

Referring to FIG. 6, a method 400 of decreasing output skew in a synchronous 
integrated circuit device such as device 100 in accordance with one embodiment of the present 
invention is shown. Method 400 includes receiving a system clock signal (at 402), delaying the 
system clock signal to generate a delayed clock signal (at 404), and applying the delayed clock 
signal to a plurality of output signal paths (at 406). In each of the output signal paths, method 
400 also includes providing a programmable delay to the delayed clock signal to generate a 
unique delayed clock signal (at 408), and using the unique delayed clock signal for that output 
signal path to clock out an output signal (at 410). Each programmable delay is provided to 
decrease output skew across the output signals. 
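
The steps of method 400 can be traced with a small timing model. The Python sketch below is a simplified illustration, not the claimed method itself: the function name, the representation of clock edges as times in picoseconds, and the fixed per-path output delays are assumptions introduced for the example.

```python
def method_400(xclk_edges_ps, dll_delay_ps, programmable_delay_ps, path_delay_ps):
    """Return the output transition times for each output signal path.

    xclk_edges_ps: received system clock edge times (step 402).
    dll_delay_ps: delay applied to form the delayed clock (step 404).
    programmable_delay_ps: per-path programmable delay (step 408).
    path_delay_ps: per-path intrinsic delay from clocking to the pad.
    """
    delayed = [t + dll_delay_ps for t in xclk_edges_ps]            # 404
    outputs = {}
    for name, prog in programmable_delay_ps.items():               # 406
        unique = [t + prog for t in delayed]                       # 408
        outputs[name] = [t + path_delay_ps[name] for t in unique]  # 410
    return outputs

# Hypothetical values: DQ0 is 30 ps faster than DQS, so it gets 30 ps more delay.
out = method_400(
    xclk_edges_ps=[0, 5000],
    dll_delay_ps=4000,
    programmable_delay_ps={"DQS": 100, "DQ0": 130},
    path_delay_ps={"DQS": 900, "DQ0": 870},
)
```

With the programmable delays chosen to offset the difference in intrinsic path delay, both outputs switch at the same times in this model.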

Conclusion 

Thus, an apparatus and method for reducing skew across the output data bus of a DDR 
SDRAM device have been described herein. By reducing output skew, the data eye for the memory 
controller has been enlarged, and limits on operating speed of the device due to output skew can be 
reduced to allow for faster operation. An apparatus and method for reducing skew across multiple 
data output signals in other memory device types, and across multiple output signals in other 
integrated circuit devices, have also been described. 

The above description is intended to be illustrative, and not restrictive. Many other 
embodiments will be apparent to those of ordinary skill in the art. For example, an apparatus or 
method in accordance with the present invention may be used in other types of memory devices, or 
other integrated circuit devices. Also, different types of input circuits, delay circuits, and output 
circuits may be used. Further, the apparatus and method of the present invention may be configured 
to sample output skew only on the rising edges of the output signals, or only on the falling edges, 
or on both the rising and falling edges. Also, the delays provided by the programmable delay 
circuits may be programmed statically and/or dynamically, and the programmable delay circuits may 
be provided in all or only a portion of the output signal paths. Different types of variable delay 
elements may be used, and may provide different lengths of delays and different resolutions of delay. 
The scope of the present invention should therefore be determined with reference to the appended 
claims, along with the full scope of equivalents to which such claims are entitled. 